Hexagon: enable HMX for GATED_DELTA_NET #57
Co-authored-by: max-krasnyansky <1380796+max-krasnyansky@users.noreply.github.com>
This patch implements HMX acceleration for the GATED_DELTA_NET operation in the Hexagon backend.
It modifies `op_gated_delta_net` to conditionally dispatch to `hmx_gated_delta_net_ext` when HMX is enabled and the dimensions are compatible (`S_v % 32 == 0`).

The HMX logic allocates the necessary F16 VTCM buffers, converts the standard F32 state matrices to F16 column-major tiled data across worker threads, dispatches `Q6_` execution using the HMX software queue (which handles the `state * gate` equivalent logic via dot products on permuted columns), and extracts the `attn_data` efficiently.

PR created automatically by Jules for task 4030593511233830126 started by @max-krasnyansky
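For reviewers, the dispatch condition and the idea of a column-major tiled layout can be sketched roughly as below. This is a minimal illustration and not the code in this patch: `gdn_can_use_hmx`, `gdn_tiled_index`, and `HMX_TILE` are hypothetical names, the tile edge of 32 is inferred only from the `S_v % 32 == 0` check, and the element ordering shown is a plain column-major tiling assumed for clarity. The layout the HMX hardware actually expects may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed tile edge, inferred from the S_v % 32 == 0 check in the description. */
#define HMX_TILE 32

/* Hypothetical helper: take the HMX path only when HMX is enabled and
 * S_v is a positive multiple of the tile edge; otherwise fall back. */
static bool gdn_can_use_hmx(bool hmx_enabled, size_t S_v) {
    return hmx_enabled && S_v > 0 && (S_v % HMX_TILE) == 0;
}

/* Hypothetical layout: map element (row, col) of an S_v x S_v matrix to its
 * offset in a buffer of 32x32 tiles. Tiles are ordered column-major, and
 * elements inside each tile are also column-major. */
static size_t gdn_tiled_index(size_t row, size_t col, size_t S_v) {
    assert(S_v % HMX_TILE == 0 && row < S_v && col < S_v);
    size_t tiles_per_col = S_v / HMX_TILE;      /* tiles stacked along rows */
    size_t tile_r = row / HMX_TILE;             /* tile coordinates */
    size_t tile_c = col / HMX_TILE;
    size_t in_r   = row % HMX_TILE;             /* position inside the tile */
    size_t in_c   = col % HMX_TILE;
    size_t tile_id = tile_c * tiles_per_col + tile_r;
    return tile_id * HMX_TILE * HMX_TILE + in_c * HMX_TILE + in_r;
}
```

In a sketch like this, each worker thread could convert a disjoint range of tile columns, since every tile occupies its own contiguous block of the destination buffer.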